Amendments to the Claims: 



Listing of Claims: 

1. (Currently Amended) In a computer data processing system, a method for clustering
data in a database comprising: 

a) providing a database having a number of data records having both discrete 
and continuous attributes; 

b) grouping together data records from the database which have specified 
discrete attribute configurations; 

c) performing clustering of data records in two phases including a first phase 
and a second phase, the first phase clustering the data records over a discrete attribute 
space, and the second phase clustering continuous attributes the data having the same
or similar specified discrete attribute configuration based on the continuous attributes
to produce an intermediate set of data clusters; and 

d) merging together clusters from the intermediate set of data clusters to produce a 
clustering model. 

2. (Original) The method of claim 1 wherein the clustering model includes a table of 
probabilities for the discrete data attributes of the data records for a cluster and wherein 
the cluster model for continuous data attributes comprises a mean and a covariance for 
each cluster. 
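
By way of a non-limiting illustration of the cluster model recited in claim 2, the sketch below summarizes one cluster as a table of probabilities for its Boolean discrete attributes together with a mean and a covariance for its continuous attributes. The function name `summarize_cluster`, the record layout (a pair of a bit tuple and a value tuple), and the use of the population covariance are assumptions made for the example and are not taken from the claims.

```python
from statistics import fmean


def summarize_cluster(records):
    """Summarize a cluster of (bits, values) records.

    bits   -- tuple of 0/1 flags (the discrete attributes, assumed Boolean)
    values -- tuple of floats (the continuous attributes)
    """
    n = len(records)
    n_bits = len(records[0][0])
    dim = len(records[0][1])

    # Table of probabilities: fraction of records with each bit set.
    probs = [sum(bits[j] for bits, _ in records) / n for j in range(n_bits)]

    # Mean of each continuous attribute.
    mean = [fmean(vals[k] for _, vals in records) for k in range(dim)]

    # Population covariance matrix of the continuous attributes.
    cov = [
        [
            sum((v[a] - mean[a]) * (v[b] - mean[b]) for _, v in records) / n
            for b in range(dim)
        ]
        for a in range(dim)
    ]
    return {"n": n, "probs": probs, "mean": mean, "cov": cov}
```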

3. (Original) The method of claim 1 wherein the process of merging of intermediate
clusters is ended when a specified number of clusters has been formed. 

4. (Original) The method of claim 1 wherein the step of merging of intermediate
clusters is ended when a distance between intermediate clusters is greater than a 
specified minimum distance. 

5. (Original) The method of claim 1 wherein the discrete attributes are Boolean and 
similarity between configurations is based on a distance between bit patterns of the 
discrete attributes. 
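
As a non-limiting illustration of claim 5, one distance between bit patterns is the Hamming distance, that is, the number of positions at which two patterns differ. The claim does not name a particular distance; the choice of Hamming distance, the function name `hamming_distance`, and the tuple-of-bits representation are assumptions made for this sketch.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length bit patterns differ."""
    if len(a) != len(b):
        raise ValueError("bit patterns must have the same length")
    return sum(x != y for x, y in zip(a, b))
```

For example, the configurations (1, 0, 1) and (1, 1, 0) differ in two positions, so under this assumed measure they are at distance 2.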

6. (Original) The method of claim 1 wherein one or more of the discrete attributes have 
more than two possible values and comprising the step of subdividing a discrete 
attribute having more than two possible values into multiple Boolean value attributes. 
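
As a non-limiting illustration of the subdividing step of claim 6, a discrete attribute with more than two possible values may be expanded into one Boolean attribute per possible value. The function name `to_boolean_attributes` and the one-flag-per-value encoding are assumptions made for this sketch; the claim does not prescribe a particular encoding.

```python
def to_boolean_attributes(value, possible_values):
    """Expand one multi-valued attribute into one 0/1 flag per possible value."""
    return tuple(int(value == v) for v in possible_values)
```

Under this assumed encoding, a color attribute with possible values red, green and blue becomes three Boolean attributes, and a record whose color is green maps to (0, 1, 0).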

7. (Original) The method of claim 5 wherein the step of identifying configurations 
includes tabulating data records having the same discrete attribute bit pattern and 
combining the data records from similar configurations before clustering the data 
records so tabulated based on the continuous attributes. 

8. (Currently Amended) In a computer data processing system, a method for clustering 
data in a database comprising: 

a) providing a database having a number of data records having both discrete 
and continuous attributes; 

b) performing a first discrete cluster by counting data records from the database 
which have the same discrete attribute configuration and identifying a first set of 
configurations wherein the number of data records of each configuration of said first set 
of configurations exceeds a threshold number of data records; 

c) adding data records from the database not belonging to one of the first set of 
configurations with a configuration within said first set of configurations to produce a 
subset of records from the database belonging to configurations in the first set of
configurations; and 

d) performing a second continuous clustering of the subset of records contained 
within at least some of the first set of configurations based on the continuous data 
attributes of records contained within that first set of configurations to produce a 
clustering model. 
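
Steps b) and c) of claim 8 may be sketched, in a non-limiting way, as follows: records are counted by discrete attribute configuration, configurations whose count exceeds a threshold form the first set, and each remaining record is added to the nearest configuration of that set. The use of Hamming distance to pick the nearest configuration, the function names, and the record layout (a pair of a bit tuple and a value tuple) are assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Number of differing positions between two equal-length bit patterns."""
    return sum(x != y for x, y in zip(a, b))


def group_by_configuration(records, threshold):
    """Group continuous values under the frequent discrete configurations.

    records   -- iterable of (bits, values) pairs
    threshold -- a configuration is kept when its count exceeds this number
    """
    records = list(records)
    counts = Counter(bits for bits, _ in records)
    frequent = [cfg for cfg, n in counts.items() if n > threshold]
    if not frequent:
        raise ValueError("no configuration exceeds the threshold")

    groups = {cfg: [] for cfg in frequent}
    for bits, values in records:
        # Records of infrequent configurations join the closest frequent one.
        target = bits if bits in groups else min(
            frequent, key=lambda cfg: hamming(cfg, bits)
        )
        groups[target].append(values)
    return groups
```

Each resulting group can then be clustered on its continuous attributes as recited in step d).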

9. (Original) The method of claim 8 wherein the clustering model includes a table of 
probabilities for the discrete data attributes of the data records for a cluster and wherein 
the cluster model for continuous data attributes comprises a mean and a covariance for 
each cluster. 

10. (Original) The method of claim 8 wherein an added record not contained within the
first set of configurations is added to one of said first set of configurations based on a 
distance between a smaller configuration to which said added record belongs during 
counting of records in different configurations. 

11. (Original) The method of claim 8 wherein the clustering of records from a
configuration based on continuous data attributes results in a variable number of 
clusters for each configuration based on the number of records in said configuration. 

12. (Original) The method of claim 8 wherein the clustering of records from records
falling within a configuration of the first set results in a number of intermediate clusters 
which are merged together to form the cluster model. 

13. (Original) The method of claim 12 wherein intermediate clusters are merged
together based on a distance between clusters that is determined based on both 
continuous and discrete attributes of said intermediate clusters. 

14. (Original) The method of claim 13 wherein the merging of intermediate clusters is
performed until a specified number of clusters are contained in the cluster model. 

15. (Original) The method of claim 13 wherein the merging of intermediate clusters is
performed until a distance between two closest clusters is greater than a threshold 
distance. 
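
The merging recited in claims 13 to 15 may be sketched, in a non-limiting way, as repeatedly merging the two closest intermediate clusters until either a specified number of clusters remains or the closest pair is farther apart than a threshold distance. The particular distance used here (Euclidean distance between means plus the summed absolute difference of the discrete probability tables), the count-weighted averaging on merge, and all names are assumptions made for the example; the claims require only that the distance depend on both continuous and discrete attributes.

```python
import math
from itertools import combinations


def cluster_distance(a, b):
    """Assumed distance combining continuous means and discrete probabilities."""
    continuous = math.dist(a["mean"], b["mean"])
    discrete = sum(abs(p - q) for p, q in zip(a["probs"], b["probs"]))
    return continuous + discrete


def merge_pair(a, b):
    """Merge two cluster summaries using count-weighted averages."""
    n = a["n"] + b["n"]

    def mix(x, y):
        return [(a["n"] * u + b["n"] * v) / n for u, v in zip(x, y)]

    return {"n": n, "mean": mix(a["mean"], b["mean"]),
            "probs": mix(a["probs"], b["probs"])}


def merge_clusters(clusters, target_count=1, max_distance=math.inf):
    """Merge closest pairs until a stopping condition is met."""
    clusters = list(clusters)
    while len(clusters) > target_count:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        # Stop when even the two closest clusters are too far apart.
        if cluster_distance(clusters[i], clusters[j]) > max_distance:
            break
        merged = merge_pair(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

Setting only `target_count` corresponds to the stopping condition of claim 14, and setting only `max_distance` corresponds to that of claim 15.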

16. (Original) The method of claim 8 wherein a list of records of each configuration in
the first set of configurations is maintained as data records are accessed from the 
database. 

17. (Original) The method of claim 8 where the clustering based on the continuous
attributes of records within a configuration is performed using expectation 
maximization clustering of the continuous attributes. 

18. (Currently Amended) A data processing system comprising:

a) a storage medium for storing a database having a number of data records 
having both discrete and continuous attributes; 

b) a computer for evaluating data records from the database and building a 
clustering model that describes data in the database; and 

c) a database management system including a component for selectively 
retrieving data records from the database for evaluation by the computer; 

d) said computer including a stored program for i) grouping together data
records from the database which have specified discrete attribute configurations; ii) a
first clustering of data records having the same or a similar specified discrete attribute
configuration based on the continuous attributes to produce an intermediate set of data
clusters; and iii) a second clustering data records based on the continuous attributes;
and iv) merging together the first clustering and the second clustering clusters from the
intermediate set of data clusters to produce a clustering model.

19. (Original) The system of claim 18 wherein the computer includes a rapid access
storage for maintaining a list of data records from the database for data records having 
a specified discrete attribute configuration to facilitate clustering of the data records 
based on their continuous attributes. 

20. (Original) The data processing system of claim 18 wherein the database
management system comprises means for subdividing discrete attributes having more 
than two possible values into multiple Boolean value attributes having two possible 
values. 

21. (Original) The system of claim 18 wherein the rapid access storage of said computer
includes a data structure for storing a clustering model. 

22. (Currently Amended) A computer readable medium containing stored instructions 
for clustering data in a database comprising instructions for:

a) reading records from a database having a number of data records having both 
discrete and continuous attributes; 

b) performing a first clustering of grouping together data records from the
database which have specified discrete attribute configurations; 

c) performing a second clustering of the clustering data records having the
same or similar specified discrete attribute configuration based on the continuous
attributes to produce an intermediate set of data clusters; and

d) merging together clusters from the intermediate set of data clusters to
produce a clustering model. 

23. (Original) The computer readable medium of claim 22 including instructions for 
maintaining a clustering model that includes a table of probabilities for the discrete data 
attributes of the data records for a cluster and wherein the cluster model for continuous 
data attributes comprises a mean and a covariance for each cluster. 

24. (Original) The computer readable medium of claim 22 wherein the instructions end
the process of merging of intermediate clusters when a specified number of clusters has 
been formed. 

25. (Original) The computer readable medium of claim 22 wherein the instructions end 
the process of merging intermediate clusters when a distance between intermediate 
clusters is greater than a specified minimum distance. 

26. (Original) The computer readable medium of claim 22 wherein the discrete 
attributes are Boolean and the instructions determine similarity between configurations
based on a distance between bit patterns of the discrete attributes. 

27. (Original) The computer readable medium of claim 22 wherein the instructions
identify configurations by tabulating data records having the same discrete attribute bit 
pattern and combining the data records from similar configurations before clustering 
the data records so tabulated based on the continuous attributes. 

28. (Original) The computer readable medium of claim 22 wherein the clustering of 
records from a configuration based on continuous data attributes produces a variable 
number of the intermediate clusters for each configuration based on the number of
records in said configuration. 

29. (Original) The computer readable medium of claim 22 wherein the instructions 
maintain a list of records of each configuration as data records are accessed from the 
database. 

30. (Original) The computer readable medium of claim 22 wherein the instructions 
cluster records within a configuration based on the continuous attributes of records 
within that configuration using expectation maximization clustering of the continuous 
attributes. 

31. (Original) The computer readable medium of claim 30 where records are assigned
to a single cluster during the expectation maximization clustering process. 